This notebook contains the first building blocks of a model, still under development, that correlates climate and environmental factors with conflict occurrence.
import conflict_model
import pandas as pd
import geopandas as gpd
from configparser import RawConfigParser
import matplotlib.pyplot as plt
import numpy as np
import datetime
import rasterstats as rstats
import xarray as xr
import rasterio as rio
import os, sys
conflict_model.utils.show_versions()
Geopandas versions lower than 0.7.0 do not yet have the clip function. The notebook will thus not work with these versions.
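# compare (major, minor) numerically; a plain string comparison would misorder e.g. '0.10.0' and '0.7.0'
if tuple(int(part) for part in gpd.__version__.split('.')[:2]) < (0, 7):
    sys.exit('please upgrade geopandas to version 0.7.0 or higher, your current version is {}'.format(gpd.__version__))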
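Comparing version numbers as plain strings is lexicographic, so a newer release such as '0.10.0' sorts before '0.7.0'. The sketch below contrasts this with a numeric comparison of the leading components. The helper `version_tuple` is a hypothetical name introduced for illustration, and it assumes the leading components are purely numeric.

```python
def version_tuple(version, n_parts=2):
    """Return the first n_parts components of a version string as integers."""
    return tuple(int(part) for part in version.split('.')[:n_parts])

# lexicographic comparison: the character '1' sorts before '7', so this is True
string_check = '0.10.0' < '0.7.0'

# numeric comparison on (major, minor): (0, 10) < (0, 7) is False, as intended
tuple_check = version_tuple('0.10.0') < version_tuple('0.7.0')

print(string_check, tuple_check)
```

Only the major and minor components are compared here, which is sufficient for a minimum-version check such as 0.7.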
All settings for the analysis are defined in a configuration file. By 'parsing' it, the values of its different sections are read. This is a simple way to make the code independent of the input data and settings.
settings_file = r'../data/run_setting.cfg'
config = RawConfigParser(allow_no_value=True)
config.read(settings_file)
#out_dir
out_dir = config.get('general','output_dir')
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)
print('for the record, saving output to folder {}'.format(out_dir) + os.linesep)
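To make the parsing step concrete, here is a minimal sketch of what such a settings file might contain and how its values are read. The section and option names ('general'/'output_dir', 'settings'/'y_start', 'settings'/'y_end') are those used in this notebook; the values and the in-memory string are made up for illustration and do not reflect the actual run_setting.cfg.

```python
from configparser import RawConfigParser

# hypothetical file content; only the section and option names are taken from the notebook
example_cfg = """
[general]
output_dir = ../out

[settings]
y_start = 2000
y_end = 2005
"""

config_example = RawConfigParser(allow_no_value=True)
config_example.read_string(example_cfg)  # read() would take a file path instead

output_dir = config_example.get('general', 'output_dir')  # values are returned as strings
y_start = config_example.getint('settings', 'y_start')    # getint() converts to int
y_end = config_example.getint('settings', 'y_end')

print(output_dir, y_start, y_end)
```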
gdf = conflict_model.utils.get_geodataframe(config)
conflict_gdf, extent_gdf = conflict_model.selection.select(gdf, config)
def conflict_in_year_bool(conflict_gdf, extent_gdf, config, sim_year, out_dir, saving_plots=False, showing_plots=False):
    """Determines for one year the number of fatalities per water province and derives a boolean value indicating whether conflict occurred in that year in that water province or not.

    Arguments:
        conflict_gdf {geodataframe}: geodataframe containing final selection of georeferenced conflicts
        extent_gdf {geodataframe}: geodataframe containing polygons (water provinces) of selected extent
        config {configuration}: parsed configuration settings
        sim_year {int}: year for which conflict occurrence is determined
        out_dir {str}: path to the folder where the plot is saved

    Keyword Arguments:
        saving_plots {bool}: whether or not to save the plot of boolean conflict and conflict fatalities (default: False)
        showing_plots {bool}: whether or not to show the plot (default: False)

    Returns:
        the conflicts selected for sim_year, the selection merged with the extent, and the extent with summed and boolean fatalities
    """

    print('determining whether a conflict took place or not...')

    # select the entries which occurred in this year
    temp_sel_year = conflict_gdf.loc[conflict_gdf.year == sim_year]

    # merge this selection with the extent data
    # note: newer geopandas versions use the keyword 'predicate' instead of 'op'
    data_merged = gpd.sjoin(temp_sel_year, extent_gdf, how="inner", op='within')

    # per water province the annual total fatalities are computed
    fatalities_per_province = (data_merged['best']
                               .groupby(data_merged['watprovID'])
                               .sum()
                               .to_frame()
                               .rename(columns={"best": "best_SUM"}))
    print(fatalities_per_province)

    # the totals are stored in a separate column of the extent data;
    # the default inner merge drops water provinces without any conflict entry
    annual_fatalities_sum = pd.merge(extent_gdf, fatalities_per_province, on='watprovID')

    # if the fatalities exceed 0.0, this entry is assigned a value 1, otherwise 0
    annual_fatalities_sum['conflict_bool'] = np.where(annual_fatalities_sum['best_SUM'] > 0.0, 1, 0)

    print('...DONE' + os.linesep)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10), sharey=True)

    # left panel: boolean conflict per water province plus the individual events
    annual_fatalities_sum.plot(ax=ax1, column='conflict_bool',
                               vmin=0,
                               vmax=2,
                               categorical=True,
                               legend=True)
    temp_sel_year.plot(ax=ax1, legend=True, color='r', label='PRIO/UCDP events')
    extent_gdf.boundary.plot(ax=ax1,
                             color='0.5',
                             linestyle=':',
                             label='water province borders')
    ax1.set_xlim(extent_gdf.total_bounds[0]-1, extent_gdf.total_bounds[2]+1)
    ax1.set_ylim(extent_gdf.total_bounds[1]-1, extent_gdf.total_bounds[3]+1)
    ax1.set_title('conflict_bool ' + str(sim_year))
    ax1.legend()

    # right panel: aggregated fatalities per water province
    annual_fatalities_sum.plot(ax=ax2,
                               column='best_SUM',
                               vmin=0,
                               vmax=1500,
                               legend=True,
                               legend_kwds={'label': "FATALITIES_SUM",
                                            'orientation': "vertical"},)
    extent_gdf.boundary.plot(ax=ax2,
                             color='0.5',
                             linestyle=':')
    ax2.set_xlim(extent_gdf.total_bounds[0]-1, extent_gdf.total_bounds[2]+1)
    ax2.set_ylim(extent_gdf.total_bounds[1]-1, extent_gdf.total_bounds[3]+1)
    ax2.set_title('aggr. fatalities ' + str(sim_year))

    fn_out = os.path.join(out_dir, 'boolean_conflict_map_' + str(sim_year) + '.png')

    if saving_plots:
        plt.savefig(fn_out, dpi=300)

    if not showing_plots:
        plt.close()

    return temp_sel_year, data_merged, annual_fatalities_sum
As a first step, we want to know in which water provinces there was conflict or not. To that end, we accumulate the number of fatalities per water province and use this sum as a proxy for whether there was a conflict or not (presumably there is a rather strong link).
# note: np.arange excludes the end value, so y_end itself is not simulated
for sim_year in np.arange(config.getint('settings', 'y_start'), config.getint('settings', 'y_end'), 1):

    print('entering year {}'.format(sim_year) + os.linesep)

    temp_sel_year, data_merged, extent_waterProvinces_with_boolFatalities = conflict_in_year_bool(conflict_gdf,
                                                                                                  extent_gdf,
                                                                                                  config,
                                                                                                  sim_year,
                                                                                                  out_dir,
                                                                                                  showing_plots=True,
                                                                                                  saving_plots=False)

    # conflict_gdf_perYear,
    # extent_conflict_merged,
    # fatalities_per_waterProvince,
    # extent_waterProvinces_with_boolFatalities = conflict_model.analysis.conflict_in_year_bool(conflict_gdf,
    # extent_gdf,
    # config,
    # sim_year,
    # out_dir,
    # showing_plots=True,
    # saving_plots=False)

    GDP_PPP_gdf = conflict_model.env_vars_nc.rasterstats_GDP_PPP(extent_waterProvinces_with_boolFatalities,
                                                                 extent_gdf,
                                                                 config,
                                                                 sim_year,
                                                                 out_dir,
                                                                 showing_plots=True,
                                                                 saving_plots=False)
extent_waterProvinces_with_boolFatalities.head()
So the master dataframe with ALL conflicts reported looks like this...
conflict_gdf.head()
...and has in total that many entries:
len(conflict_gdf)
For each year, a subset is created of all conflict entries which started in that year. After the loop, the subset of the last simulated year is still available. It has the same columns as the master dataframe...
temp_sel_year.head()
...but of course fewer entries, namely only this many:
len(temp_sel_year)
This sub-set dataframe is then merged with the dataframe containing the geometry of the water provinces. Each conflict entry in the sub-set dataframe is assigned the water province (plus water province data) where the conflict took place.
data_merged.head()
Logically, this merged dataframe has as many entries as the sub-set dataframe but more columns, i.e. those added from the extent dataframe:
len(data_merged)
Per water province the number of fatalities is summed and added to a new column. Also, a boolean value of 1 (i.e. True) is assigned to each water province whose fatality sum exceeds zero, indicating that one or more conflicts took place in this province. The structure of the dataframe (besides the two added columns) comes from the geometry/extent dataframe of the water provinces and NOT from the dataframe containing the conflict data.
extent_waterProvinces_with_boolFatalities.head()
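The aggregation described above can be illustrated on a toy table without any geometry. This is only a sketch: the column names 'watprovID', 'best', 'best_SUM' and 'conflict_bool' follow the notebook, while the event data are made up.

```python
import numpy as np
import pandas as pd

# made-up conflict events: water province ID and best estimate of fatalities per event
events = pd.DataFrame({'watprovID': [1, 1, 2, 3],
                       'best': [10, 5, 0, 7]})

# sum the fatalities of all events within each water province
per_province = (events.groupby('watprovID')['best']
                      .sum()
                      .to_frame()
                      .rename(columns={'best': 'best_SUM'}))

# 1 if at least one fatality was reported in the province, otherwise 0
per_province['conflict_bool'] = np.where(per_province['best_SUM'] > 0, 1, 0)

print(per_province)
```

Note that province 2 has an event but zero fatalities, so its boolean value is 0 under this definition.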
Since several conflicts can occur in one water province (which is very likely), this dataframe contains fewer rows than the merged dataframe. In addition, the code currently drops all water provinces which have not seen conflict in this year.
len(extent_waterProvinces_with_boolFatalities)
As a result, the zonal statistics are only stored for those water provinces.
NOTE: if you use the geometry/extent dataframe instead as argument, zonal statistics are computed for all water provinces, but then the columns containing (boolean) fatalities are lost.
GDP_PPP_gdf.head()
len(GDP_PPP_gdf)
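The loss of conflict-free water provinces noted above stems from the default inner join of pd.merge. A possible alternative, sketched here on plain dataframes standing in for the extent and fatality data, would be a left merge that keeps every water province and fills missing sums with zero. This is an assumption about how the behaviour could be changed, not what the code currently does.

```python
import pandas as pd

# stand-ins for the extent dataframe and the per-province fatality sums (made-up values)
provinces = pd.DataFrame({'watprovID': [1, 2, 3, 4]})
fatalities = pd.DataFrame({'watprovID': [1, 3], 'best_SUM': [15, 7]})

# default how='inner': only provinces present in both tables survive
inner = pd.merge(provinces, fatalities, on='watprovID')

# how='left': all provinces are kept; those without conflict get NaN, replaced by 0
left = pd.merge(provinces, fatalities, on='watprovID', how='left')
left['best_SUM'] = left['best_SUM'].fillna(0)
left['conflict_bool'] = (left['best_SUM'] > 0).astype(int)

print(len(inner), len(left))
```

With such a table, zonal statistics could be computed for all water provinces while the (boolean) fatality columns remain available.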